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Abstract 

The aim of this work is to perform numerical simulations of the 
propagation of a laser in a plasma. At each time step, one has to solve 
a Helmholtz equation in a domain which consists in some hundreds 
of millions of cells. To solve this huge linear system, one uses a it- 
erative Krylov method with a preconditioning by a separable matrix. 
The corresponding linear system is solved with a block cyclic reduc- 
tion method. Some enlightments on the parallel implementation are 
also given. Lastly, numerical results are presented including some fea- 
tures concerning the scalability of the numerical method on a parallel 
architecture. 

1 Introduction 

The numerical simulation of propagation of high power intensity lasers in a 
plasma is of importance for the "NIP project" in USA and "LMJ Facility 
project" in Prance. It is a very challenging area for scientific computing 
indeed the laser wave length 27r/A;o is equal to a fraction of one micron 
and the simulation domain has to be much larger than 500 microns. One 
knows that in a plasma the index of refraction is equal to \/^ — A^(x), where 
A^(x) = Ne{x)/Nc and Ng is the electron plasma density at position x and 
the critical density Nc is a constant depending only on the wave length. Of 
course, the laser propagates only in the region where A^(x) < 1. In macro- 
scopic simulations (where the simulation lengths are in the order of some 



millimeters) , geometrical optics models are used and numerical solutions are 
based on ray tracing methods. To take into account more specific phenom- 
ena such as diffraction, autofocusing and filamentation, one generally uses 
models based on a paraxial approximation of the full Maxwell equations. 
This kind approximation is based on the assumption that the density A^(x) 
is close to a mean value Nav ; it allows to make an expansion ok W.K.B. type 
with a constant wave vector. At the end of section 2, we recall the paraxial 
equation ([7]) ; see for example [1], [1] in a classical framework and [5] for 
an analysis of this equation in a tilted frame and a numerical method. But, 
there are situations where the macroscopic variations of the plasma density 
A'^e are not small, particularly in the zones which are just before the critical 
density. In this zone, the laser beam undergoes a strong change of direc- 
tion near a surface called caustic surface. That is to say the wave vector is 
strongly varying near this surface, the paraxial approximation is no more 
valid and one has to deal with a model based on a frequency wave equation 
(obtained by time envelope of the solution of the full Maxwell equations). 
The model is described in the section 2. For a derivation of the models and 
a physical exposition of the phenomena under interest, see e.g. [15] or [7]. 

This paper is aiming at describing the numerical methods for solving the 
frequency wave equation and the coupling with the model for the plasma 
behavior. At each time step, one has to find the solution i/j = V'(x) of the 
following Helmholtz problem 

fcQ-^AV^ + ((1 - A) + i/i) ^ = / (1) 

where / is a given complex function and /U a positive function related to the 
absorption of the laser by the plasma. 

In this paper, only 2D problems are considered but the method may 
be extended to 3D simulations. Let us set x = {x,y) the two spatial co- 
ordinates. The key assumption is that the gradient of the density A(x) is 
parallel to the a;-axis, then we set 

N{x,y)=No{x) + 6N{x,y) (2) 

where A'o depends on the x variable only and 6N is small compared to 
1. This allows to deal with real physical situations as it is shown in the 
numerical applications below. 

The simulation domain is a rectangular box and a classical finite differ- 
ence method is used for the spatial discretization ; for an accurate solution 
it is necessary to have a spatial step equal to a fraction of the wave length. 
If Hx and Uy denote the number of discretization points in each direction, it 



leads to solve a the linear system with Uxny degrees of freedom (which may 
be in the order of 10^ for a typical 2D spatial domain). 

One chooses an iterative method of Krylov type with a preconditioning 
by a matrix corresponding to the discretization of ([T]) with N replaced by 
Nq. This leads to solve a linear system corresponding to a separable tri- 
diagonal block matrix (each block matrix), then a block cyclic 

reduction method may be used ; see for instance [inj and [1^ for this kind 
of method. The crucial point for this method is to decompose the unknown 
onto the basis of the eigenvectors of the main-diagonal block matrix. 

The paper is organized as follows. After the statement of the model 
in section 2, we present the main difficulties for the numerical simulation 
of such problems in section 3. In section 4, the numerical scheme for the 
Helmholtz solver is presented, especially the method for solving the pre- 
conditioner with the block cyclic reduction method ; some enlightments on 
the parallel implementation and on the coupling with the hydrodynamics 
part are also given. Lastly, numerical results are presented including some 
features concerning the scalability of the numerical method. 

2 Statement of the model 

Our goal is to perform simulations taking into account diffraction, refraction 
and auto-focusing phenomena and it is necessary to perform a coupling 
between the the fluid dynamics system for the plasma behavior and the 
frequency wave equation for the laser propagation (notice that the Brillouin 
parametric instabilities which create laser backscattering are not taken into 
account up to now). 

The laser beam is characterized by an electromagnetic wave with a fixed 
pulsation, so it may be modeled by the time envelope ^ = ^{t,x) of this 
electric field. It is a slowly time varying complex function. On the other 
hand, for modeling the plasma behavior one introduces the non-dimension 
electron density N = N{t,x) and the plasma velocity U = U(t,x). 

Modeling of the plasma. For the plasma, the simplest model is 
the following fluid model. Let us denote P = P{N,Te) a smooth function 
of the density and of the electron temperature Tg ( Tg is a very smooth 
given function of the spatial variable x) . Then one has to solve the following 



barotropic Euler system 



-N + ViNU) = 0, (3) 



^(ArU) + V(iVUU) + V(P(7V,re)) = -7V7pV|*|2. 



The term 7pV|V'P corresponds to a ponderomotive force due to a laser 
pressure (the coefficient jp is a constant depending only on the ion species). 

Modeling of the laser beam. Let us denote e = kQ^. The laser field 
^ = x) is a solution to the following frequency wave equation (which is 
of Schrodinger type) 

2z-:^^' + eA^ + -(l-iV)^' + ii/^' = 0, (5) 
cot e 

where the absorption coefficient u is a real coefficient related to the absorp- 
tion of the laser intensity by the plasma and c the light speed. 

Boundary conditions. The laser beam is assumed to enter into 
the domain at x = 0. Denote by e^ the unit vector in the direction of the 
incoming beam. Since the density depends mainly on the x— variable, we 
may denote by N^"' the mean value of the incoming density on the boundary 
and by A^°"* the mean value of the density on the outgoing boundary . The 
boundary condition at x = reads (with n the outwards normal to the 
boundary) 

(en. V + iK.n) (^ - a*"e^'="^^) = 0. (6) 



where K = efeVl — A^*", a™ = a*"(y) is a smooth function which is, roughly 
speaking, independent of the time. On the part of the boundary x = Xmax, 
there are two cases according to the value A^°"* : 

i) If A^°"* > the wave do not propagate up to the boundary and the 
boundary condition may read as d^/dx = 0. 

ii) If iV°"* < one has to consider a transparent boundary condition. 
Here we take the simplest one, that is to say en. 

On the other hand, on the part of the boundary corresponding to y = 
and y = Umax, it is crucial to have a good transparent boundary condition, 
so we introduce perfectly matched layers (the P.M.L. of [3)- For the simple 
equation — — Lo'^tp = /, this technique amounts to replace in the neigh- 
borhood of the boundary, the operator ^ by ( 1 + ) where o" is a 



dy '"■^ i-^; 

damping function which is not zero only on two or three wave lengths and 
which increases very fast up to the boundary. Notice that the feature of this 



method is that it is necessary to modify the discretization of the Laplace 
operator on a small zone near the boundaries. 

The paraxial equation. For the sake of completeness, we recall now 
the paraxial approximation equation which is valid only if the plasma density 
is a very smooth function, in such a way that we can take Nq = Ni^-y where 
A'^a.v in a constant. So we can define a mean wave vector K = ei,\/l — Nav 
and the laser beam is now characterized by the space and time envelope of 
the electric field E = E{t,x), that is to say 



The envelope E is assumed to be slowly varying with the space variable and 
thus satisfies 



where denotes the Laplace operator in the hyperplane transverse to K. 
It is necessary to supplement equation ([7]) with a boundary condition on the 
incoming boundary which is E{0, .) = a*" (and an initial condition). See 



The discretization and the solving of the above system of partial differential 
equations is very challenging since different space scales are to be considered. 
On the other hand, it leads to very large linear system to solve. 

3.1 Multiscale in space 

For solving ([5]), the spatial mesh has to be very fine, hHeimhoitz — Aq/IO 
or less each direction (recall that Aq = 2-11 /ko) ; this mesh is called in the 
sequel the Helmholtz grid. But the modulus |V'| of the electric field is slowly 
varying with respect to the spatial variable, thus one can use a coarse mesh 
for the simulation of the Euler system, typically one can set h fluid — Ao/2. 

For the numerical solution of the fluid system, we refer to the method 
described in or [1] which has been implemented in a parallel platform 
called HERA, the ponderomotive force is taken into account by adding the 
ponderomotive force proportional to VIV'P to the pressure force. The plasma 
density A^, the velocity U and the laser intensity \Tp\'^ are evaluated at the 
center of each cell. 



^'(t,x) = E{t,ii)e 



,iK.x/£ 




II] , IS]. 



3 Difficulties 



So we handle a two-level mesh of finite difference type : in a 2D sim- 
ulation, each cell of the fluid system is divided into po x pQ cells for the 
Helmholtz level, with po = 5 or more. At each time step 6t determined 
by the CFL criterion for the Euler system, one has to solve the frequency 
wave equation ([S]). For the time discretization of this equation, an implicit 
scheme is used. The length c5t is very large compared to the spatial step 
therefore the time derivative term may be considered as a perturbation. So, 
at each time step, if V'*"* denotes the value of the solution at the beginning 
of the time step, one has to find ijj solution of the following equation of the 
Helmholtz type 

e2AV + ((l-iV) + i/ii)V = i/UoV'™ (8) 

where = e2/{c5t), fii = e{2/{c6t) + 1^). This equation is supplemented by 
a boundary condition at x = 

e|- + iK,)^ = + iK,)(a-e^K-J'A). (9) 
ox ox 

and another one in x = Xmax as above. 



3.2 Large Scale Problem 



As we shall see, the main difficulty comes from the Helmholtz equation: 
the number of unknowns is quite large and the properties of the resulting 
linear system makes it hard to solve. The linear system is symmetric but 
not Hermitian ; these properties are inherited by the discretized equations. 
The resulting linear systems are thus difficult to precondition. Concerning 
the preconditioning, the hypothesis ([2]) leads to replaced the original system 
by another one which is simpler since it does not take into account the 
perturbation 5N(x,y). The corresponding linear system to be solved leads 
to a five-diagonal symmetric non-hermitian matrix Aq 
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(10) 



where T is equal to a constant times the identity matrix of dimension n^, 
the matrix A of dimension and /3 is a complex constant. The matrix Aq 
is separable and therefore a block cyclic reduction method may be used for 
its numerical solution. 



Notice that for a realistic simulation where the sizes of the domain is 
500 iim times 700 fim and the wave length equal to 0.35 fim, it leads to 12 
millions fluid unknowns and 300 millions unknowns on the Helmholtz grid . 



4 Numerical Strategies 
4.1 Helmholtz solver 

Due to the size of the problem, a direct solver (even a parallel one) can not 
be used. We have to use an iterative method, in our case a preconditioned 
Krylov solver. As for the preconditioner, it seems difficult to propose one 
which would be valid booth in the central zone where a pure Helmholtz equa- 
tion is considered and in the Perfectly Matched Layers (PML). Therefore, 
we first make a decomposition of the domain into three subdomains: two 
thin PML layers and a large central domain with appropriate interface con- 
ditions, see § 14.1.11 The preconditioning of the central problem is presented 
in § 14.1.21 it corresponds to the solution to an approximated equation. 

Let us mention right away that a multigrid method for the Helmholtz 
equation (jlip would not work. Indeed, multigrid methods are efficient only 
if a large enough damping parameter is present in the equation. Here, the 
coefficient c6t is larger than 100 wave lengths and the term u is typically 
given by the formula 

zy = i^cNo{xf 

with l/uc ill the order of 15 fim (typical values of the density are around 
0.4). So the damping term ko^i = /co(/io + 1^) is quite small when compared 
to the wave length iTrk^^ and it is too weak for a multigrid method. 

To solve the linear system arising from the discretization of the Helmholtz 
equation 

e^A^ + i/iiV- + (1 - No{x))^l; - 6n{x, y)V' = i/UoV''™, (H) 

we use the fact that the function Nq depends only of a one-dimension vari- 
able and we deal with a Krylov method with a preconditioner which cor- 
responds to a separable matrix. This preconditioner may be interpreted as 
the discretization of the operator 

iP ^ e^AiP + ifio^ + {l- No{x))ip. (12) 

with the same boundary conditions. A block cyclic reduction method is then 
used for solving the corresponding linear system. 




Figure 1: Domain decomposition into three overlapping subdomains 



Let us mention that the idea of preconditioning a variable coefficient 
Helmholtz equation by a problem amenable to the separation of variables 
technique was investigated in |12j in the context of seismic modeling. Al- 
though in this context the method was not satisfactory, we shall see that 
in our context (laser-plasma interaction), results are indeed very good. We 
mention as well another method [6] based on the preconditioning of (jlip by 
a shifted Helmholtz equation with Pq ~ 0.3 — 0.5 

e2AV + i/ii^ + (l + i/?o)[(l-iV)^] 
that would be amenable to a multigrid solver. 



4.1.1 Domain Decomposition 

The computational domain is divided into three overlapping subdomains: a 
purely Helmholtz zone Qc and two zones bordering it above and below 
Ofe, see fig. [H In Qt and 0^, we have both a PML and a Helmholtz region. 

The coupling between the subdomains is made via Robin interface con- 
ditions, see [9] and [2]. So solving of equation ([8]) leads to consider the 
following coupled system of equations, where the unknown functions are 

'>Pt,il^c,4'b 
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ipb + i/iiV'b + (1 - No)iJb = 



dy dy 
The coupling interface condition are of Robin type with a 
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0.5/e. The 

boundary condition at x = is given ([9]) for ipc and by (e^ + iKx)'4' = ^or 
ipt and ipb- These equations are discretized by a finite difference scheme. Let 
us denote by ^'f, and the corresponding unknown vector in domains 
Vtt, and Ofe. The hnear system to be solved reads : 





where M 



Api Ci 
C2 Ah C3 
Ci Ap2 



(13) 



The blocks (Cj)i<i<4 are related to the discrete Robin interface conditions. 



4.1.2 The matrix system 

Denote by A^n the diagonal matrix corresponding to the discretization of 
the operator of multiplication by —6n + ^ifJ-i — Mo) and by Aq the one 
corresponding to the discretization of equation (I12p . that is to say Ah = 
Ag + then the matrix M may be decomposed as 



M = Md + Me, 



with 



Api 

Ap2 



and Me 



Ci 
C2 AsN C's 
C4 



The principle of solving the linear system (I13p is to use a Krylov method 
preconditioned by Mp, which is a block diagonal matrix ; here we choose 
a GMRES algorithm (without restarting since the number of iterations is 
quite low, see below section 5.2). To apply the preconditioner, since matrices 
Api, Ap2 are small ones, they can be factorized by a direct method and 



from the computational point of view, the key step is to use a fast solver for 
the matrix Aq- 

Let us describe the structure of the matrix Aq coming from the dis- 
cretization of equation p^ . Denote by Sy is the space step in y, by I the 
identity matrix of dimension n^- Let the symmetric tridiagonal matrix 

which corresponds to the discretization of the following ID problem 



i; + (1 - Noix))i^, 



with the boundary condition Their coefficients are real except the one 
in the first line and the first column (due to the boundary condition). Then 
we set 
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T - 
B 



2e^ 

^0 + «/^oI - 



5y^ 



ae 



—^1 = A + (31 
dy by" 



where (3 = —iae^/5y + ^jby^ ■ Now, if we denote = iu\^U2^ ■■■Uny) 
and / = (/i, /2, ■■■friy) where the elements Um and fm are n^;— vectors, the 
system Aq'^c = f reads as follows 
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4.1.3 Cyclic Reduction 

In order to solve system (|14p . we use the block cyclic reduction method. Let 
us recall the principle of this method, assuming that = 2'^ — 1 for the 
sake of simplicity. We know that A and T are commutative. Consider 3 
successive lines of (fn|) for i = 2, 4, — 1 : 

-Tui-2 + Aui-i - Tui = 
< - Tui-i + Aui - Tui+i = fi (15) 

- Tui + Aui+i - Tui+2 = fi+i- 



After a linear combination of these lines, we get : 



- T^A~^Ui.2 + {A- 2T^A~') - T^A~^Ui+2 = fi + TA~' + 

(16) 

After this first step, the elimination procedure may be performed again 
by induction. That is to say, denote = A, = B, T(°) = T et 

j(o) = J ; after r elimination steps, the reduced system for < r < A; — 1 
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(17) 



Ar~l) , Ar~l) 

■Ji 2r_yr-1 ' J i,2r + 2^^'^ 



(18) 



After all the elimination steps, it remains only one equation for finding 
U2k-i. Once this value is obtained, one deduces all the other values step by 
step recursively. 

From a practical point of view, one has to perform these computations 
in the spectral basis of the eigenvectors of the matrix A, which are of course 
also eigenvectors of T, B, Air\ T^^) ^ ij(^) for all r. 



4.2 Parallel Implementation 

The implementation of the method has been made in the HERA platform, 
see [I], [8]. For the Helmholtz solver, one first have to find a orthonormal 
basis of eigenvectors of Aq. As a matter of fact, even if the matrix is not 
exactly real, we search a set of eigenvectors which are orthogonal for the 



pseudo scalar product < u,v >= u'^ ■ v. To do this, we use the "new QD" al- 
gorithm of Par Lett (cf. [11]) although it was designed for Hermitian matrices. 
We conjecture that it is always possible to find such a basis of eigenvectors 
for our class of matrices. The only difficulty would be to find a non zero 
eigenvector v such that v"^ -v = 0, but in practice, we never encountered any 
problem by using the method. In the method proposed in [11], which follows 
an idea of [H] , the computation of the eigenvalues is based on a series of LU 
factorization of tridiagonal matrices. This step is sequential but very cheap 
in terms of memory and CPU time requirements especially when compared 
to the QR algorithm which would manipulate full matrices. Once the eigen- 
values have been computed, the computation of the eigenvectors consists in 
finding the kernel of tridiagonal matrices. This task is distributed among 
the processors and is thus parallel. In our tests, this method was 40 times 
faster than the QR method. 

So let us denote Q the matrix whose columns are the eigenvectors of . 
The matrix Q is orthonormal for the pseudo scalar product, that is to say 

QQ^ = Q^Q = I 

Since T is the identity matrix up to a multiplicative constant, one can 
introduce the diagonal matrices A^'^) and P^") 

^ = QA(°)Q^, T = Qp(°)g^. (19) 

So we get 

= qa(^')q^, = gr(^)g^ (20) 

with the following induction formulas 



(211 



yvw = \ir-i) _ 2 (^r(^-i)j (^a(^-i)) , pw = (^p(^-i)j (^A(^-i)y 

Let us summarize the algorithm 

• Introduce the vectors /j transformed of /j in the eigenvector basis 

fi = Q^fi for i = l,...,ny. 

• At each step r, the vector transformed of /''j of the right hand side. 



reads 



— J y i.2^~2^-^ ^ J i.2^+2^-^ ) 



• One computes the vectors U2k-i by solving 

A(fc-l),--, _ fC^-l) 
'U^k-i — /2fc-i 

• One recursively distributes the solutions by solving sub-systems of the 
following type 

\(r)~ _ ~{r) 

ii Uj 2^+1-2'' — ffj.2'+l-2^ 

where 

5j^2r+l„2'- = fj^2^+i-2r- +r*-^'' (?i(j-l).2'-+l + ^'(i).2'-+i) 

• Lastly, the solution u is given by 

Ui = Qui for i = 1, . . . , Uy. 

For the parallel implementation, the processors are shared out according 
to horizontal slabs in a balanced way. Let us note that the first and last 
processors contain PML layers and some lines of the central grid. The 
crucial point of the algorithm is the multiplication of a full matrix Q (and its 
transpose) of dimension rix x by the set of Uy vectors, i.e. a matrix-matrix 
multiplication. Within a realistic framework for Symmetric Multiprocessors 
(SMP) architecture, the matrix Q of a size of several giga octets can be 
contained only in the memory of the nodes of processors. This involves 
us to use a technique of hybrid parallelization of type MPI-multithreading. 
One wishes to profit from the locality of the data between processors of the 
same node and to use MPI for the others communications between nodes. 
Cutting is carried out in order to avoid conflict of writing of the threads. 
For example, shearing in Ng x N^, threads leads to 
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Each product QiBj is carried out by a call with the BLAS library. Let 
us specify that the steps of computation in the spectral basis {i.e. A^^'\ 
T^^'\ f-'^2r----) are of a very negligible cost in comparison of matrix- matrix 
products. Our tests show a good scalability of the cyclic reduction and a 
great speed of execution. 



4.3 Coupling with the hydrodynamics part 

Since the laser intensity IV'P is slowly varying according to the space variable, 
one can deal with the ponderomotive force and the hydrodynamic equations 
on the coarse grid (whose size is equal to one half of the wave length). To 
evaluate the laser intensity l^/'p on this coarse grid, it suffices to take the 
mean value of \ip\'^ on the fine grid. To get the electron density, i.e. A'o and 
Jat, we use a linear interpolation between the coarse grid and the fine grid. 
Of course in the PML zone, one do not evaluate the ponderomotive force. 

5 Numerical Results 

5.1 Test cases 

The boundary value a^"" has to mimic laser beam ; to be realistic the profile 
of a*" corresponds to a juxtaposition of a lot of small hot spots , called 
speckles whose intensity is very high compared to the mean intensity of the 
beam. The shape of a speckle is generally a Gaussian function whose width 
is about a few micrometers. 

One considers here a simulation domain of 100 x 300 wave lengths ; the 
initial profile of density is a linear function increasing from 0.1 at x = 
to 1.1 at a; = Xmax- The profile of a^^ contains only three speckles At the 
Helmholtz level, one handles only 3 millions of cells. With 32 PEs (of the 
type EV67 HP-Compaq), the CPU time is only a few minutes per time step 
with approximatively 10 Krylov iterations at each time step. 

Without the coupling with the plasma, it is well known that the solution 
is very close to the one given by Geometrical Optics and corresponds to 
parallel speckles or beamlets which are curved and tangent to a caustic line 
(this line corresponds to x = x^, such that A'o(xv,) = cos^(0), where 9 is the 
incidence angle of the beamlets where they enter into the simulation box). 
With our model, if the laser intensity is small (which corresponds to a weak 
coupling with the plasma), one notices that a small digging of the plasma 
density occurs. This digging is more significant when the laser intensity 
is larger, then an autofocusing phenomenon takes place. On fig. [21 one 
sees the map of the laser intensity that is to say the quantity IV'P) which 
corresponds to this situation after some time steps. We notice here that 
the three beamlets undergo autofocusing phenomena and something like a 
filamentation may be observed. 

Another case will be presented, corresponding to simulation domain of 
700 X 1200 wave lengths. At the Helmholtz level, one handles 84 millions 



Figure 2: Laser intensity of a schematic beamlet 



of cells and the simulation have run on 128 PEs. The map of the laser 
intensity is shown on fig. [3] after 6 picosecond (corresponding to about 15 
time steps). We have set to zero the absorption coefficient in order to have 
a sharp problem. The caustic lines corresponds to about A^o(3^*) = 0.5. Here 
the digging of the plasma is locally very important since the variation of 
density 6N reaches 0.07 in a region where No{x) = 0.45. 

5.2 Numerical performance 

We focus on the solving of the very large system (jl3p arising from the 
discretization of the Helmholtz equation by the preconditioned GMRES 
method presented above. In fig. HI we plot the iteration counts as a func- 
tion of the physical time. As time increases, the density fluctuation 6n gets 
larger and it is necessary to perform more iterations of the GMRES method. 
Nevertheless, we don't have more than 20 iterations. 

Let us address now the scalability of our computation method. So we 
consider a fixed size problem of 40 million unknowns and we increase the 
number of processors. Table [1] gives the elapsed time of one GMRES iter- 
ation. Due to the good parallel properties of the cyclic reduction, we see 
that the speed up is almost perfect. When now the number of unknowns 
increases, one can easily check that the computational effort grows propor- 
tionally to TL^Uy. In Table El from one column to the next, the number of 



Figure 3: Laser intensity of a beamlet in a plasma after 6 ps without ab- 
sorption. Notice the autofocusing of the beamlet near the caustic. The x 
axis is here vertical. 
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Figure 4: Number of GMRES iterations versus time. 



Nb Procs 


16 


32 


64 


128 


elapsed time per GMRES it. 


492s 


249s 


126s 


64s 


Efficiency GMRES algo. 


1 


0.987 


0.976 


0.96 



Table 1: Scalability for a fixed size problem. 



Nb Procs 


1 


4 


16 


64 


256 


# d.o.f. X 10*^ 


0.4 


1.6 


6.3 


25.4 


101.6 


elapsed time for QD algo. 


Is 


3s 


12s 


48s 


189s 


elapsed time per GMRES it. 


4.8s 


11.6s 


24s 


47s 


93s 



Table 2: Scalability for problems of increasing sizes. 



points is doubled in each direction so the CPU time is 8 times larger. Since 
the number of processors is multiplied by four, we check that the elapsed 
time is about two times larger. These two tables show that the paralleliza- 
tion of the cyclic reduction method work very well. 

5.3 A more realistic case 

In the realistic configurations, it may be useful to solve the paraxial equa- 
tions on a part of the simulation domain in which this approximation is 
valid and the Helmholtz equations on the remainder where the validity of 
this approximation is no more true. The mesh size for to numerical solution 
of the paraxial equation is the same as for the fluid system (see [8] , [4] ) . The 
coupling between the paraxial and Helmholtz parts is performed according 
to the classical boundary condition for the Helmholtz equation ([9]) with a*" 
replaced by E°'^^, which is the value of the solution to the paraxial equation 
at the interface boundary. The advantage of the paraxial equation is that it 
can be solved by a marching method in space where only ID systems have 
to be solved at each vertical line of unknowns ; see 0], [5]. 

Here we have performed a simulation with an initial density which is 
equal to 0.15 up the third of the simulation domain and which ranges linearly 
up to 1 at X = Xmax, the boundary value a*"" mimics a multispeckle laser 
beam. We use the paraxial model in the third of the simulation domain and 
the frequency wave equation in the complementary part, then we have much 
less unknowns to deal with for the Helmholtz problem. In this simulation, 
the computational domain size was 2000 x 2000 wave lengths. There were 200 
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Figure 5: Laser intensity for a multiple beams laser for a plasma density 
ranging up to 1. 

million cells in the Helmholtz zone and 4 millions unknowns for the paraxial 
zone The hydrodynamics equations are solved on a domain which consists 
in 16 million cells. The computation was performed using 256 processors. 
The physical time of the simulation is equal to 11 ps. The elapsed time for 
the full simulation was 8 hours. On fig. [6l the map of the laser intensity is 
represented at the end of simulation. 

Lastly, we show on fig. O a zoom of the laser intensity near the caustic 
line in a numerical simulation of the same type than the previous one, except 
that the absorption coefficient is small but not zero. One can notice the great 
accuracy of simulation which shows interference patterns of the speckles. 

6 Conclusion and Prospects 

In the framework of the hydrodynamics parallel platform HERA, we have 
developed a solver for the laser propagation based on the Helmholtz equa- 
tion that can handle realistic computations on very large computational 
domains. The Helmholtz zone is coupled with a paraxial zone and a fluid 
plasma model. The assumption that the initial density depends mainly 




Figure 6: Zoom near the caustic line of the Laser intensity for a multi-speckle 
beam. 



on the x— variable only allows to perform a preconditioning by a domain 
decomposition method (two PMLs and a large Helmholtz zone) where the 
linear system corresponding to the Helmholtz zone with a matrix Aq may 
be solved efficiently by the block cyclic reduction method. Most of the com- 
puter time is spent by applying a dense x matrix Q to a set of Uy 
vectors. We can thus achieve a very good scalability w.r.t to the number 
of unknowns in the y direction. According to an increase of the fluctuation 
of the density, we notice an increase in the number of GMRES iterations 
per time step as the physical time increases, but this number remains small 
enough to have an acceptable CPU time. 

In the future some CPU time may be saved by using a domain decompo- 
sition in the large central Helmholtz zone. For instance, simply dividing the 
Helmholtz zone in two vertical subdomains would decrease the size of the 
matrices Q by a factor 4. Moreover the use of local (and thus more accu- 
rate) averages for the density in the preconditioner could break the increase 
in the number of iterations as the time increases. Another interesting strat- 
egy could be to not consider inside the inner iteration loop of the Krylov 
method all the spatial domain that is to say all the Uy vectors but only the 
vectors which does not belong to some subinterval [ny^riy] for instance the 
ones where the solution varies very few from an iteration to the other. 
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